Time Traveling Through Trade: Visualizing Temporal Patterns to Expose Illegal Fishing
Author: Abhishek Singh
Published: May 28, 2023
Modified: June 18, 2023
Overview
FishEye International is a non-profit organization that counters illegal, unreported, and unregulated (IUU) fishing activities. They have recently obtained access to a comprehensive database from an international finance corporation, detailing fishing-related businesses. The database, converted into a knowledge graph, carries valuable information about the companies, their owners, employees, and financial conditions. Traditionally, analysts at FishEye have attempted to uncover business anomalies using standard graph analyses and node-link visualizations. However, the intricacy and scale of the data have made it challenging to discern the true structure of businesses. Consequently, a more effective visual analytics approach is urgently needed to help identify anomalous companies potentially involved in IUU fishing. This analysis aims to provide a detailed understanding of patterns for entities and their activities over time.
Objective
The primary goal of this assignment is to devise a new approach that can efficiently process the large and detailed knowledge graph data to identify anomalies in fishing businesses. This approach should allow us to spot irregular patterns, uncover hidden relationships, and reveal potential IUU-involved companies. By accomplishing this objective, we aim to significantly improve FishEye International’s ability to identify, monitor, and counteract IUU fishing activities.
My TASK
Use visual analytics to identify anomalies in the business groups present in the knowledge graph. Limit your response to 400 words and 5 images.
The code chunk uses pacman::p_load() to check whether the packages are installed. Any that are missing are installed first, and all of them are then loaded into the R session. The packages used are:
jsonlite: Used for working with JSON data in R, providing functions to parse JSON and convert it to data frames.
igraph: Offers a wide range of graph algorithms and visualization capabilities.
tidygraph: An interface for manipulating and analyzing graphs using the principles of tidy data.
ggraph: Allows for creating aesthetically pleasing and customizable graph visualizations.
lubridate: A package for working with dates and times in R.
ggiraph: Used for interactive features such as tooltips, zooming, and panning. It is particularly useful for creating interactive web-based visualizations.
hrbrthemes: Provides additional themes and styling options.
treemap: Offers functions to create treemaps.
plotly: Used for creating interactive web-based graphs.
ggstatsplot: Used for creating graphics with details from statistical tests.
graphlayouts: Provides various graph layout algorithms for arranging the nodes and edges of a graph in a visually appealing manner.
knitr: Used for dynamic report generation.
ggdist: Used for visualising distributions and uncertainty.
ggthemes: Provides additional themes for ggplot2.
tidyverse: A collection of core packages designed for data science, used extensively for data preparation and wrangling.
rstatix: Used for data manipulation, summarization, and group-wise comparisons.
Hmisc: Used to compute descriptive statistics for the variables in a dataset.
DT: An interface to DataTables that creates interactive tables on HTML pages.
summarytools: Used for creating summary statistics and tables for data exploration and reporting.
kableExtra: Used for creating tables in various output formats, such as HTML, PDF, or Word documents.
ggplot2: Provides a flexible, layered approach to creating a wide variety of high-quality static plots.
All packages can be found on CRAN.
The p_load() function from the pacman package is used in the following code chunk to install and load multiple R packages:
1.2 Importing data sets
In the code chunk below, fromJSON() of the jsonlite package is used to import MC3.json into the R environment. The output is called mc3. It is a large list R object.
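As a minimal sketch of this import step, the example below parses a small inline JSON string instead of the real MC3.json. The `nodes` and `links` array names, the toy records, and the `toy_edges` name are assumptions made for illustration, and the de-duplication and weight count are one plausible way to arrive at an edge table like the one summarised next, not necessarily the author's exact code.

```r
library(jsonlite)
library(dplyr)

# Toy stand-in for MC3.json (assumption: the file holds `nodes` and `links` arrays)
toy_json <- '{
  "nodes": [
    {"id": "Jones LLC", "country": "ZH", "type": "Company"},
    {"id": "Aaron Adams", "country": "ZH", "type": "Beneficial Owner"}
  ],
  "links": [
    {"source": "Jones LLC", "target": "Aaron Adams", "type": "Beneficial Owner"},
    {"source": "Jones LLC", "target": "Aaron Adams", "type": "Beneficial Owner"}
  ]
}'

# fromJSON() accepts a file path, URL, or JSON string and returns a list
mc3 <- fromJSON(toy_json)

# Build the edge table: drop exact duplicate rows, then count what remains
# per (source, target, type) combination as the edge weight
toy_edges <- as_tibble(mc3$links) %>%
  distinct() %>%
  group_by(source, target, type) %>%
  summarise(weights = n(), .groups = "drop")

print(toy_edges)
```

With the real file, the inline string would be replaced by the path to MC3.json.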
MC3_Edges
4 Variables 24036 Observations
--------------------------------------------------------------------------------
source
n missing distinct
24036 0 12856
lowest : 1 and Sagl Forwading 1 AS Marine sanctuary 1 Ltd. Liability Co Cargo 1 S.A. de C.V. 2 Limited Liability Company
highest: zūn yú GmbH & Co. KG Creek zūn yú N.V. Shipping zūn yú S.A. de C.V. Zuniga-Young Zuniga and Sons
--------------------------------------------------------------------------------
target
n missing distinct
24036 0 21265
lowest : Aaron Adams Aaron Adkins Aaron Allen Aaron Alvarez Aaron Baker
highest: Zachary York Zachary Young Zoe Allen Zoe Marsh Zoe Smith
--------------------------------------------------------------------------------
type
n missing distinct
24036 0 2
Value Beneficial Owner Company Contacts
Frequency 16792 7244
Proportion 0.699 0.301
--------------------------------------------------------------------------------
weights
n missing distinct Info Mean Gmd
24036 0 1 0 1 0
Value 1
Frequency 24036
Proportion 1
--------------------------------------------------------------------------------
Checking Missing Values:
A Glimpse into the Code
colSums(is.na(MC3_Edges))
source target type weights
0 0 0 0
Checking Duplicates
A Glimpse into the Code
any(duplicated(MC3_Edges))
[1] FALSE
The dataset comprises an undirected multi-graph with 27,622 nodes and 24,038 edges (the MC3_Edges table summarised above holds 24,036 rows).
It contains 7,794 connected components.
The graph is undirected, implying that relationships or interactions do not have a specific direction or order. In other words, if there is a connection between two nodes, it applies both ways.
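The connected-component count quoted above can be checked with igraph. The sketch below uses a hypothetical five-node toy graph rather than the MC3 data; `toy_edges` and the node labels are illustrative assumptions.

```r
library(igraph)

# Toy edge list (hypothetical): A-B-C form one component, D-E another
toy_edges <- data.frame(
  source = c("A", "B", "D"),
  target = c("B", "C", "E")
)

# directed = FALSE builds an undirected graph, so A-B and B-A are the same edge
g <- graph_from_data_frame(toy_edges, directed = FALSE)

comp <- components(g)
n_components <- comp$no        # number of connected components
component_sizes <- comp$csize  # number of nodes in each component

print(n_components)
print(component_sizes)
```

Applied to a graph built from MC3_Edges, the same call would give the number of components among the nodes that appear in at least one edge.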
Edge Attributes:
type: This attribute represents the type or nature of the relationship or interaction between the nodes connected by the edge.
source: This is the ID of the source node. It identifies where the relationship or interaction originates from in the network.
target: This is the ID of the target node. It identifies where the relationship or interaction is directed towards in the network.
role: This provides a more specific classification of the relationship or interaction represented by the edge, like beneficial owner or company contacts.
A Glimpse into the Code
# Count edges by type
MC3_Edges_count <- MC3_Edges %>%
  group_by(type) %>%
  summarise(n = n())

# Bar chart of edge counts per type
p <- ggplot(data = MC3_Edges_count, aes(x = type, y = n, fill = type)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(aes(label = n), vjust = -0.5) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "seashell"),
    panel.grid.major = element_line(color = "grey80"),
    panel.grid.minor = element_blank(),
    legend.position = "top",
    text = element_text(size = 12, face = "bold"),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(x = "Type", y = "Count", fill = "Type", title = "Distribution of Edge Types")

ggplotly(p)
kable(head(MC3_Nodes), format ="html", caption ="NODES")
NODES
id | country | type | revenue_omu | product_services
Jones LLC | ZH | Company | 310612303 | Automobiles
Coleman, Hall and Lopez | ZH | Company | 162734684 | Passenger cars, trucks, vans, and buses
Aqua Advancements Sashimi SE Express | Oceanus | Company | 115004667 | Holding firm whose subsidiaries are engaged in the businesses of refining and chemicals, process and pollution control equipment, minerals, fertilizers, polymers and fibers, commodity trading and services, forest and consumer products, and ranching
Makumba Ltd. Liability Co | Utoporiana | Company | 90986413 | Car service, car parts and accessories, automotive technology, diagnostics for repair shops, antilock braking and fuel-injection systems, auto electronics, starters, and alternators; Home (power tools for DIY enthusiasts, garden tools, household appliances, heating and warm water); and industry and trade (communication services, power tools for professional, sensors and foundry - MEMS, security systems, packaging technology)
Taylor, Taylor and Farrell | ZH | Company | 81466667 | Fully electric vehicles (EVs) and electric vehicle powertrain components
Harmon, Edwards and Bates | ZH | Company | 75070435 | Discount supermarket; Variety of food and non-food products
Note
mutate() and as.character() are used to convert the field data type from list to character.
To convert revenue_omu from list data type to numeric data type, we need to convert the values into character first by using as.character(). Then, as.numeric() will be used to convert them into numeric data type.
select() is used to re-organise the order of the fields.
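The conversions described in this note can be sketched as follows. The toy tibble and the `toy_nodes` and `clean_nodes` names are hypothetical; only the column names and the mutate(), as.character(), as.numeric(), and select() steps come from the note above.

```r
library(dplyr)
library(tibble)

# Toy node table with list columns, mimicking how nested JSON fields
# can arrive after import (hypothetical values)
toy_nodes <- tibble(
  id = list("Jones LLC", "Coleman, Hall and Lopez", "Aaron Adams"),
  country = list("ZH", "ZH", "ZH"),
  revenue_omu = list(310612303, "162734684", NULL),
  type = list("Company", "Company", "Beneficial Owner")
)

clean_nodes <- toy_nodes %>%
  mutate(
    id = as.character(id),
    country = as.character(country),
    type = as.character(type),
    # list -> character -> numeric; entries that are not numbers become NA,
    # so the coercion warning is suppressed deliberately
    revenue_omu = suppressWarnings(as.numeric(as.character(revenue_omu)))
  ) %>%
  # select() re-orders the fields
  select(id, country, type, revenue_omu)

str(clean_nodes)
```

Going through as.character() first matters because as.numeric() cannot be applied directly to a list whose elements are not all length-one numbers.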
A Glimpse into the Code
skim(MC3_Nodes)
Data summary
Name: MC3_Nodes
Number of rows: 27622
Number of columns: 5
Column type frequency: character 4, numeric 1
Group variables: None

Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace
id | 0 | 1 | 6 | 64 | 0 | 22929 | 0
country | 0 | 1 | 2 | 15 | 0 | 100 | 0
type | 0 | 1 | 7 | 16 | 0 | 3 | 0
product_services | 0 | 1 | 4 | 1737 | 0 | 3244 | 0

Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
revenue_omu | 21515 | 0.22 | 1822155 | 18184433 | 3652.23 | 7676.36 | 16210.68 | 48327.66 | 310612303 | ▇▁▁▁▁
A Glimpse into the Code
str(MC3_Nodes)
tibble [27,622 × 5] (S3: tbl_df/tbl/data.frame)
$ id : chr [1:27622] "Jones LLC" "Coleman, Hall and Lopez" "Aqua Advancements Sashimi SE Express" "Makumba Ltd. Liability Co" ...
$ country : chr [1:27622] "ZH" "ZH" "Oceanus" "Utoporiana" ...
$ type : chr [1:27622] "Company" "Company" "Company" "Company" ...
$ revenue_omu : num [1:27622] 3.11e+08 1.63e+08 1.15e+08 9.10e+07 8.15e+07 ...
$ product_services: chr [1:27622] "Automobiles" "Passenger cars, trucks, vans, and buses" "Holding firm whose subsidiaries are engaged in the businesses of refining and chemicals, process and pollution "| __truncated__ "Car service, car parts and accessories, automotive technology, diagnostics for repair shops, antilock braking a"| __truncated__ ...
MC3_Nodes
5 Variables 27622 Observations
--------------------------------------------------------------------------------
id
n missing distinct
27622 0 22929
lowest : 1 and Sagl Forwading 1 AS Marine sanctuary 1 Eel Corporation Transport 1 Ltd. Corporation Transport 1 Ltd. Liability Co
highest: Zuniga Inc Zuniga Ltd Zuniga PLC Zuniga, Burgess and Davenport Zuniga, Logan and Newton
--------------------------------------------------------------------------------
country
n missing distinct
27622 0 100
lowest : Afarivaria Alverossia Alverovia Andenovia Anderia del Mar
highest: Wysterion Yggdrasonia Zambarka Zawalinda ZH
--------------------------------------------------------------------------------
type
n missing distinct
27622 0 3
Value Beneficial Owner Company Company Contacts
Frequency 11949 8639 7034
Proportion 0.433 0.313 0.255
--------------------------------------------------------------------------------
revenue_omu
n missing distinct Info Mean Gmd .05 .10
6107 21515 4637 1 1822155 3574819 4915 5243
.25 .50 .75 .90 .95
7676 16211 48328 190919 612716
lowest : 3.652227e+03 4.657797e+03 4.660665e+03 4.666673e+03 4.666703e+03
highest: 2.914265e+08 2.929701e+08 3.049959e+08 3.082496e+08 3.106123e+08
--------------------------------------------------------------------------------
product_services
n missing distinct
27622 0 3244
lowest : (Italian) peeled tomatoes, legumes, vegetables, fruits and canned mushrooms 100 percent Spanish olives; peppers, green, black, and manzanilla stuffed olives; anchovies-stuffed olives; and black olives; Olives recipes 2 or 3-piece containers, twist off caps, easy opening and traditional caps; Cutting; varnishing and metal plate lithography 8 Cement Mixer Units, Ocean Freight, Air Freight, Project Logistics, Continental Container Line CCL, Atlantic Pacific Container line APL, Project Arabia Line PAL source: freelance researcher A chemical science firm with a focus on the development of high purity, high performance products and services
highest: Young soybeans in pods; and spring rolls includes shrimp mini spring rolls with shiitake mushroom, vegetable mini spring rolls with shiitake mushroom, and all natural pre-fried vegetable mini spring rolls with shiitake mushroom Zinc and aluminum die cast hardware and components Zinc and aluminum die cast parts Zinc metal Zumba clothing and accessories
--------------------------------------------------------------------------------
Checking Missing Values:
A Glimpse into the Code
colSums(is.na(MC3_Nodes))
id country type revenue_omu
0 0 0 21515
product_services
0
Checking Duplicates
A Glimpse into the Code
any(duplicated(MC3_Nodes))
[1] TRUE
Node Attributes:
type: The classification or category of the node. This can indicate the nature of the entity, such as company, owner, or worker.
country: This attribute represents the country associated with the node. This can be either a full country name or a two-letter country code.
product_services: This provides a description of the products or services associated with the node. This can help in understanding the node’s role in the network.
revenue_omu: This is the operating revenue of the node in Oceanus Monetary Units (OMU). It gives a measure of the financial size or activity of the node.
id: This is the unique identifier of the node. This ID is also the name of the entity it represents.
role: This is a subset of the “type” attribute, providing more detailed classification of the node. It includes roles like beneficial owner or company contacts.
#3 Data Visualization
A Glimpse into the Code
# Count nodes by type
MC3_Nodes_count <- MC3_Nodes %>%
  group_by(type) %>%
  summarise(n = n())

# Bar chart of node counts per type
p <- ggplot(data = MC3_Nodes_count, aes(x = type, y = n, fill = type)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(aes(label = n), vjust = -0.5) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "seashell"),
    panel.grid.major = element_line(color = "grey80"),
    panel.grid.minor = element_blank(),
    legend.position = "top",
    text = element_text(size = 12, face = "bold"),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(x = "Type", y = "Count", fill = "Type", title = "Distribution of Node Types")

ggplotly(p)
2. Visualization
2.1 Top 10 Countries with Highest revenue
A Glimpse into the Code
# Group the data by country and calculate the total revenue
top_countries <- MC3_Nodes %>%
  group_by(country) %>%
  summarise(total_revenue = sum(revenue_omu, na.rm = TRUE)) %>%
  arrange(desc(total_revenue)) %>%
  head(10)

# Plot the top 10 countries by total revenue
p <- ggplot(data = top_countries, aes(x = reorder(country, -total_revenue), y = total_revenue)) +
  geom_bar(stat = "identity") +
  # geom_text(aes(label = round(total_revenue)), vjust = -0.5) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "seashell"),
    panel.grid.major = element_line(color = "grey80"),
    panel.grid.minor = element_blank(),
    legend.position = "top",
    text = element_text(size = 12, face = "bold"),
    plot.title = element_text(hjust = 0.5)
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Country", y = "Total Revenue (OMU)", fill = "Country", title = "Top 10 Countries by Total Revenue")

# Convert ggplot object to a plotly object for interactivity
p_interactive <- ggplotly(p)
p_interactive
2.2 Number of Edges (connections) per Node
A Glimpse into the Code
# Build the graph with graph_from_data_frame() from igraph
g <- graph_from_data_frame(MC3_Edges, directed = FALSE)

# Calculate the degree of every node
node_degrees <- degree(g)

# Convert to a data frame
df_degrees <- data.frame(node = names(node_degrees), degree = node_degrees)

# Histogram of node degrees (x-axis limited to the 0-6 range)
p <- ggplot(df_degrees, aes(x = degree)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  xlim(0, 6) +
  theme_minimal() +
  labs(x = "Degree", y = "Count", title = "Distribution of Node Degrees")

gp <- ggplotly(p)
gp
Note
The majority of the nodes in the network graph have a degree of 1. This means that most entities in the network have only one connection to another entity. The count of 29,229 represents a substantial proportion of the total nodes.
As the degree increases, the number of nodes with that degree decreases substantially. This trend indicates that it is less common for entities to have multiple connections in the network. There are 2,526 nodes with a degree of 2, far fewer than those with a degree of 1.
A further decrease is observed for nodes with degrees 3, 4, and 5, which have counts of 1,100, 447, and 257 respectively. This consistent decline suggests that entities with many connections are quite rare in this network.
Lastly, within the plotted range, entities with a degree of 5 are the rarest. Nodes at this end of the distribution may be highly connected entities or potential hubs in the network. Overall, the degree distribution suggests a sparse and potentially disconnected network structure, which might present challenges in identifying broad structural anomalies. However, it also helps highlight entities with higher degrees as potential points of interest.
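The per-degree counts discussed in this note can also be read off directly as a frequency table rather than from the histogram. The sketch below does this on a hypothetical toy graph; `toy_edges` and the node labels are illustrative assumptions.

```r
library(igraph)

# Toy edge list (hypothetical): A is a small hub, E-F is an isolated pair
toy_edges <- data.frame(
  source = c("A", "A", "A", "E"),
  target = c("B", "C", "D", "F")
)
g <- graph_from_data_frame(toy_edges, directed = FALSE)

# degree() returns one value per node; table() counts nodes per degree value
node_degrees <- degree(g)
deg_counts <- table(node_degrees)

print(deg_counts)

# Nodes with the highest degree are candidate hubs worth a closer look
hubs <- names(node_degrees)[node_degrees == max(node_degrees)]
print(hubs)
```

Because the table is not clipped by axis limits, it also shows any nodes whose degree falls outside the range drawn in the histogram.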
2.3 Proportion of Nodes in each ‘Country’
A Glimpse into the Code
# Calculate the number of nodes in each country and keep the top 10
country_nodes <- MC3_Nodes %>%
  count(country) %>%
  arrange(desc(n)) %>%
  head(10)

p1 <- ggplot(country_nodes, aes(reorder(country, -n), n)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Top 10 Countries by Node Count", x = "Country", y = "Node Count") +
  coord_flip() +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "seashell"),
    panel.grid.major = element_line(color = "grey80"),
    panel.grid.minor = element_blank(),
    legend.position = "top",
    text = element_text(size = 12, face = "bold"),
    plot.title = element_text(hjust = 0.5)
  )

gp1 <- ggplotly(p1)
gp1
Note
The country with the most nodes in the graph is ZH, accounting for 22,439 nodes. This significant concentration indicates that ZH is a major player within the network and likely plays a crucial role in the industry.
The second most represented country is Oceanus, with 2,143 nodes. While this is considerably less than ZH, it still represents a substantial number of nodes and suggests that Oceanus also holds a significant position within the network.
The third most represented country is Marebak, with 742 nodes. Despite having only about a third of the nodes of Oceanus and a far smaller number than ZH, Marebak still has a noteworthy presence within the network.
Overall, these results suggest a significant concentration of nodes within a few countries, specifically ZH, Oceanus, and Marebak. This could potentially indicate centralization of activities within these regions. Future investigations could help in understanding what specific roles these countries play in the network, and how their large presence may impact the dynamics of the entire network.
# Identify the top 5 countries by total revenue
top_5 <- MC3_Nodes %>%
  group_by(country) %>%
  summarise(total_revenue = sum(revenue_omu, na.rm = TRUE)) %>%
  arrange(desc(total_revenue)) %>%
  head(5)

# Keep only nodes from those countries
top_countries_5 <- MC3_Nodes[MC3_Nodes$country %in% top_5$country, ]

# Group by country and company, and calculate total revenue per company
top_countries_5 <- top_countries_5 %>%
  filter(type == "Company") %>%
  group_by(country, id) %>%
  summarise(company_revenue = sum(revenue_omu, na.rm = TRUE), .groups = "drop") %>%
  arrange(country, desc(company_revenue))

# For each country, keep the five companies with the highest revenue
top_countries_5 <- top_countries_5 %>%
  group_by(country) %>%
  slice_max(order_by = company_revenue, n = 5)

treemap(
  top_countries_5,
  index = c("country", "id"),
  vSize = "company_revenue",
  vColor = "company_revenue",
  palette = "Paired",
  border.lwds = 2,
  border.col = "white",
  title = "Top Companies by Revenue in Top 5 Countries",
  fontsize.labels = c(14, 10),
  fontfamily.labels = "Arial",
  fontcolor.labels = c("white", "black"),
  align.labels = list(c("center", "center"), c("left", "top")),
  position.legend = "bottom"
)
Note
The treemap provides a visual representation of the companies that generate the highest revenues in their respective countries. The top 5 countries were selected based on total revenue. Within each of these countries, companies were sorted by revenue and the five highest earners were kept.
The findings from the treemap are as follows:
The majority of the highest revenue-generating companies are registered in the country labeled as ‘ZH’.
Among these, the top 3 companies in terms of revenue generated have been identified as ‘Jones LLC’, ‘Patton Ltd’, and ‘Ramirez,Gallaghar and Jhonson’ Group.
The dataset also indicates that in ‘Utoporiana’ and ‘Oceanus’, the top revenue earners are ‘Assam Limited Liability Company’ and ‘Aqua Advancements Sashimi SE Express’ respectively.
2.5 Tokenization
The number of times the word “fish” appears in the product_services field is calculated for each node and stored in a new column, n_fish.
# A tibble: 27,622 × 6
id country type revenue_omu product_services n_fish
<chr> <chr> <chr> <dbl> <chr> <int>
1 Jones LLC ZH Comp… 310612303. Automobiles 0
2 Coleman, Hall and Lopez ZH Comp… 162734684. Passenger cars,… 0
3 Aqua Advancements Sashimi … Oceanus Comp… 115004667. Holding firm wh… 0
4 Makumba Ltd. Liability Co Utopor… Comp… 90986413. Car service, ca… 0
5 Taylor, Taylor and Farrell ZH Comp… 81466667. Fully electric … 0
6 Harmon, Edwards and Bates ZH Comp… 75070435. Discount superm… 0
7 Punjab s Marine conservati… Riodel… Comp… 72167572. Beef, pork, chi… 0
8 Assam Limited Liability … Utopor… Comp… 72162317. Power and Gas s… 0
9 Ianira Starfish Sagl Import Rio Is… Comp… 68832979. Light commercia… 0
10 Moran, Lewis and Jimenez ZH Comp… 65592906. Automobiles, tr… 0
# ℹ 27,612 more rows
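A minimal sketch of how a column like n_fish can be produced, assuming stringr's str_count() is the counting function (the original code chunk is not shown, so this is an assumption). The toy rows are hypothetical.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy node table (hypothetical rows)
toy_nodes <- tibble(
  id = c("n1", "n2", "n3"),
  product_services = c(
    "Automobiles",
    "Fresh fish, frozen fish and shellfish",
    "Fishing gear"
  )
)

# str_count() counts non-overlapping matches of the pattern in each string.
# The match is case sensitive and also hits substrings such as "shellfish".
toy_nodes <- toy_nodes %>%
  mutate(n_fish = str_count(product_services, "fish"))

print(toy_nodes)
```

Note that "Fishing gear" scores 0 here because of the capital F; lower-casing the field first with str_to_lower() would make the count case insensitive.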
Tokenisation is the process of breaking up a given text into units called tokens. Tokens can be individual words, phrases or even whole sentences. In the process of tokenisation, some characters like punctuation marks may be discarded. The tokens usually become the input for the processes like parsing and text mining.
unnest_tokens() of tidytext is used to split the text in the product_services field into words. The resulting table, token_nodes, is then counted and plotted in the code chunk below.
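The tokenisation step itself might look like the sketch below, which applies tidytext's unnest_tokens() to a hypothetical two-row table. The toy data are assumptions; the `token_nodes` and `word` names match those used in the plotting code that follows.

```r
library(dplyr)
library(tidytext)
library(tibble)

# Toy node table (hypothetical rows)
toy_nodes <- tibble(
  id = c("n1", "n2"),
  product_services = c("Fresh fish and seafood", "Frozen fish")
)

# unnest_tokens(output, input): one row per word, lower-cased by default,
# with punctuation stripped; the other columns are carried along
token_nodes <- toy_nodes %>%
  unnest_tokens(word, product_services)

print(token_nodes)
```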
# Count each word and plot the 15 most frequent
p <- token_nodes %>%
  count(word, sort = TRUE) %>%
  top_n(15) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  # x is the word axis and y the count axis (shown flipped by coord_flip())
  labs(x = "Unique words", y = "Count", title = "Count of unique words found in product_services field") +
  theme(plot.background = element_rect(fill = "seashell"))

ggplotly(p)
The bar chart reveals that the unique words include some that are not useful for analysis, for instance “a” and “to”. In the world of text mining, these are called stop words. They should be removed from the analysis because they are fillers used to compose a sentence.
The tidytext package provides a dataset called stop_words that helps us remove them.
# A tibble: 51 × 2
word n
<chr> <int>
1 products 1860
2 fish 740
3 seafood 622
4 frozen 467
5 services 429
6 food 345
7 related 329
8 equipment 309
9 fresh 276
10 salmon 252
# ℹ 41 more rows
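Stop-word removal of this kind is commonly done with an anti-join against tidytext's stop_words dataset. The sketch below shows that pattern on hypothetical toy data; the `filtered_words` name matches the code that follows, but the exact steps the author used are an assumption.

```r
library(dplyr)
library(tidytext)
library(tibble)

# Toy node table (hypothetical rows)
toy_nodes <- tibble(
  id = c("n1", "n2"),
  product_services = c("Fresh fish and seafood", "Frozen fish")
)

# Tokenise, then drop every token that appears in the stop_words lexicon
filtered_words <- toy_nodes %>%
  unnest_tokens(word, product_services) %>%
  anti_join(stop_words, by = "word")

# Word frequencies after stop-word removal
word_counts <- filtered_words %>%
  count(word, sort = TRUE)

print(word_counts)
```

anti_join() keeps only the rows of the left table that have no match in the right table, which is why "and" disappears while "fish" stays.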
A Glimpse into the Code
# Fix the random seed so the word cloud layout is reproducible
set.seed(1234)

filtered_words %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 50))
A Glimpse into the Code
# Count each word after stop-word removal and plot the 15 most frequent
p <- filtered_words %>%
  count(word, sort = TRUE) %>%
  top_n(15) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  # x is the word axis and y the count axis (shown flipped by coord_flip())
  labs(x = "Unique words", y = "Count", title = "Count of unique words found in product_services field") +
  theme(plot.background = element_rect(fill = "seashell"))

ggplotly(p)
Betweenness Centrality on top 15 Words
The nodes dataset has a column, ‘product_services’, that is not available in the edges dataset. To perform the analysis, we take the most frequent words found in that column and identify the nodes whose ‘product_services’ text mentions any of them.
After identifying and filtering these particular nodes, we’ll utilize them as a reference for filtering our edges dataset. Specifically, we’ll only keep the edges where the ‘source’ or ‘target’ matches with the ID of our filtered nodes. This method allows us to create a network subset that’s related to our specific words from the ‘product_services’ column.
A Glimpse into the Code
top_words <- c("products", "fish", "seafood", "frozen", "services", "food", "related",
               "equipment", "fresh", "salmon", "accessories", "materials", "systems", "freight")

# Filter nodes that contain any of the top words in the product_services column
MC3_NodesFilter <- MC3_Nodes %>%
  filter(str_detect(product_services, paste(top_words, collapse = "|")))

# Filter edges where the source or target is in the filtered nodes
MC3_EdgeFilter <- MC3_Edges %>%
  filter(source %in% MC3_NodesFilter$id | target %in% MC3_NodesFilter$id)
Betweenness centrality measures the number of times a node acts as a bridge along the shortest path between two other nodes. It is useful for identifying nodes that serve as a connector or broker within a network. In illegal fishing, a node with high betweenness centrality might represent a key intermediary, such as a specific ship or company that’s heavily involved in transporting or selling illegal catch.
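To make the definition concrete, the sketch below computes betweenness centrality with igraph on a hypothetical five-node graph. The edge list and node labels are illustrative assumptions; on the real data the same call would be applied to a graph built from the filtered edges.

```r
library(igraph)

# Toy undirected graph (hypothetical): B sits between A, E and the C-D branch
toy_edges <- data.frame(
  source = c("A", "B", "C", "B"),
  target = c("B", "C", "D", "E")
)
g <- graph_from_data_frame(toy_edges, directed = FALSE)

# betweenness() counts, for each node, the shortest paths between
# other pairs of nodes that pass through it
bt <- betweenness(g, directed = FALSE)

# Rank nodes from most to least "bridging"
bt_ranked <- sort(bt, decreasing = TRUE)
print(bt_ranked)
```

Here B lies on the shortest path of five node pairs and C on three, while the leaf nodes A, D, and E score zero, so B would be the first candidate intermediary to inspect.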
The violin plot visualizes the distribution of a numerical variable (revenue_omu) across different categories (type). It provides information on the central tendency, variability, and distributional shape of the revenue data for each type.
A Glimpse into the Code
# Violin plot of revenue by node type
p <- ggplot(MC3_Nodes1, aes(x = type, y = revenue_omu)) +
  geom_violin(trim = FALSE) +
  labs(x = "Type", y = "Revenue OMU") +
  theme(plot.background = element_rect(fill = "seashell")) +
  scale_y_continuous(labels = scales::comma) +
  coord_flip()

plotly::ggplotly(p)
Additionally, another violin plot was created specifically for the ‘Beneficial Owner’ type because it has more revenue than the rest. This allows a more detailed examination of the revenue distribution for this specific type.
A Glimpse into the Code
# Filter data
MC3_Nodes1_filtered <- MC3_Nodes1 %>%
  filter(type %in% c("Beneficial Owner"))

# Create the violin plot
p <- ggplot(MC3_Nodes1_filtered, aes(x = type, y = revenue_omu)) +
  geom_violin(trim = FALSE) +
  labs(x = "Type", y = "Revenue OMU") +
  theme(plot.background = element_rect(fill = "seashell")) +
  scale_y_continuous(labels = scales::comma) +
  coord_flip()

# Convert to interactive plot
plotly::ggplotly(p)
Recommendations, Limitations and Takeaways
RECOMMENDATIONS
Deep Dive into Entities with High Degrees: Given the sparsity of the network, entities with higher degrees can be seen as significant connectors. A deeper dive into these entities could provide more valuable insights. What type of entities are they? How do they connect different parts of the network? What role do they play in the context of fishing business and potential IUU activities?
Country-Specific Analysis: Given the concentration of nodes in a few countries (especially ZH), it would be valuable to conduct a more detailed country-specific analysis. Understanding the specific roles these countries play in the network and how their large presence impacts the dynamics of the entire network could provide valuable insights.
Revenue-Based Analysis: The treemap visualization and violin plots provided valuable insights into the revenue patterns across different companies and types of entities. A more detailed revenue-based analysis could be performed, exploring the relationship between revenue and other attributes or network properties.
LIMITATIONS
Network Sparsity: The network appears to be quite sparse, potentially indicating a disconnected network structure. This might present challenges in identifying broad structural anomalies or overarching patterns.
Limited Attributes for Analysis: The lack of attributes limits the depth and breadth of the analysis. For example, attributes related to the nature and volume of fishing activities, legal status, historical data, etc., could have provided additional dimensions for analysis.
KEY TAKEAWAYS
Significance of Network Measures: Network measures such as degree and betweenness centrality can provide valuable insights into the roles and importance of nodes within a network. High-degree nodes and nodes with high betweenness centrality can be of particular interest in the context of IUU fishing activities.
Role of Textual Data: The analysis also highlighted the potential of textual data. The use of specific words in the ‘product_services’ column allowed for a more targeted analysis and extraction of a relevant subset of the network.
Importance of Revenue Analysis: The analysis of revenue data revealed patterns and anomalies that can be indicative of potential IUU activities. Companies generating disproportionately high revenues and the revenue patterns of specific types of entities are worth further investigation.